<font size = 3>

Visualizing Air Pollution Disparities in New York City

Nick Sawhney

Overview:

Every New Yorker has had a day when the city air just didn't agree with them. Although air pollution in New York has steadily improved over the years, it still affects the health of many people across the city. I wanted to analyze the sources of air pollution within the city and how they line up with its health effects, so I use the Air Quality dataset from NYC Open Data to visualize the disparities between the two. I'll be using the data analysis library pandas for the air quality data and geopandas for geographic data analysis. For visualization, I'll be using matplotlib for static maps and folium for interactive maps.

Formatting the Data

Let's import the libraries we're using.

In [1]:
import pandas as pd
import geopandas as gpd
import folium
import branca
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

<font size = 3>I'm using DOHMH's Air Quality dataset from NYC Open Data. Let's take a quick look at it.

In [2]:
air_data = pd.read_csv('Air_Quality.csv', encoding='latin1')
air_data
Out[2]:
indicator_data_id indicator_id name Measure geo_type_name geo_entity_id geo_entity_name year_description data_valuemessage
0 130728 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Borough 1 Bronx 2005 2.8
1 130729 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Borough 2 Brooklyn 2005 2.8
2 130730 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Borough 3 Manhattan 2005 4.7
3 130731 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Borough 4 Queens 2005 1.9
4 130732 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Borough 5 Staten Island 2005 1.6
5 130727 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration Citywide 1 New York City 2005 2.9
6 130685 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 101 Kingsbridge - Riverdale 2005 2.9
7 130686 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 102 Northeast Bronx 2005 2.8
8 130687 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 103 Fordham - Bronx Pk 2005 2.7
9 130688 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 104 Pelham - Throgs Neck 2005 2.7
10 130689 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 105 Crotona -Tremont 2005 3.0
11 130690 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 106 High Bridge - Morrisania 2005 3.0
12 130691 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 107 Hunts Point - Mott Haven 2005 2.8
13 130692 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 201 Greenpoint 2005 3.7
14 130693 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 202 Downtown - Heights - Slope 2005 3.7
15 130694 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 203 Bedford Stuyvesant - Crown Heights 2005 2.5
16 130695 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 204 East New York 2005 2.3
17 130696 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 205 Sunset Park 2005 3.2
18 130697 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 206 Borough Park 2005 2.5
19 130698 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 207 East Flatbush - Flatbush 2005 2.3
20 130699 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 208 Canarsie - Flatlands 2005 2.4
21 130700 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 209 Bensonhurst - Bay Ridge 2005 2.8
22 130701 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 210 Coney Island - Sheepshead Bay 2005 2.4
23 130702 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 211 Williamsburg - Bushwick 2005 2.8
24 130703 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 301 Washington Heights 2005 4.1
25 130704 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 302 Central Harlem - Morningside Heights 2005 3.9
26 130705 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 303 East Harlem 2005 4.2
27 130706 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 304 Upper West Side 2005 3.9
28 130707 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 305 Upper East Side 2005 4.5
29 130708 646 Air Toxics Concentrations- Average Benzene Con... Average Concentration UHF42 306 Chelsea - Clinton 2005 4.9
... ... ... ... ... ... ... ... ... ...
2739 151731 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 206 Borough Park 2005 1.1
2740 151732 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 207 East Flatbush - Flatbush 2005 0.9
2741 151733 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 208 Canarsie - Flatlands 2005 0.4
2742 151734 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 209 Bensonhurst - Bay Ridge 2005 1.0
2743 151735 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 210 Coney Island - Sheepshead Bay 2005 0.6
2744 151736 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 211 Williamsburg - Bushwick 2005 0.5
2745 151737 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 301 Washington Heights 2005 1.6
2746 151738 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 302 Central Harlem - Morningside Heights 2005 0.9
2747 151739 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 303 East Harlem 2005 2.1
2748 151740 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 304 Upper West Side 2005 1.9
2749 151741 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 305 Upper East Side 2005 2.9
2750 151742 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 306 Chelsea - Clinton 2005 3.4
2751 151743 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 307 Gramercy Park - Murray Hill 2005 4.4
2752 151744 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 308 Greenwich Village - SoHo 2005 3.8
2753 151745 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 309 Union Square - Lower East Side 2005 3.2
2754 151746 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 310 Lower Manhattan 2005 1.6
2755 151747 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 401 Long Island City - Astoria 2005 1.2
2756 151748 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 402 West Queens 2005 2.0
2757 151749 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 403 Flushing - Clearview 2005 1.2
2758 151750 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 404 Bayside - Little Neck 2005 2.0
2759 151751 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 405 Ridgewood - Forest Hills 2005 1.0
2760 151752 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 406 Fresh Meadows 2005 2.1
2761 151753 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 407 Southwest Queens 2005 1.4
2762 151754 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 408 Jamaica 2005 0.6
2763 151755 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 409 Southeast Queens 2005 0.5
2764 151756 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 410 Rockaways 2005 0.2
2765 151757 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 501 Port Richmond 2005 0.3
2766 151758 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 502 Stapleton - St. George 2005 0.8
2767 151759 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 503 Willowbrook 2005 0.8
2768 151760 645 Traffic Density- Annual Vehicle Miles Traveled... Per 100 km2 UHF42 504 South Beach - Tottenville 2005 0.3

2769 rows × 9 columns

<font size = 3>Each row of this dataset has a name, a measure type, a year or time period, and a region. I create a column that combines all the details of a measurement into one description, so that I can have a unique column for each measurement in each region. I want to make sure that I'm only working with neighborhood-level data, so I remove Borough and Citywide rows from the set. I also remove the traffic density rows and the hospitalization data from 2005-2007, since our average pollutant levels are all from 2009-2010.

In [3]:
air_data = air_data[~air_data['name'].str.startswith('Traffic Density')]
air_data = air_data[~air_data['year_description'].str.contains('2005-2007')]
air_data = air_data[air_data['geo_type_name'] == 'UHF42'].reset_index()
air_data['pollutant'] = air_data['name'] + ', ' + air_data['year_description'] + ', ' + air_data['Measure']
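
To make the filtering step concrete, here is a minimal sketch on a tiny made-up table that reuses the dataset's column names. The rows and values are invented for illustration only; it keeps the UHF42 rows and builds the combined description the same way as the cell above.

```python
import pandas as pd

# Toy rows with the dataset's column names; the values are made up for illustration.
toy = pd.DataFrame({
    'name': ['Sulfur Dioxide (SO2)', 'Sulfur Dioxide (SO2)',
             'Traffic Density- Annual Vehicle Miles Traveled'],
    'Measure': ['Average Concentration', 'Average Concentration', 'Per 100 km2'],
    'geo_type_name': ['Borough', 'UHF42', 'UHF42'],
    'geo_entity_name': ['Bronx', 'Northeast Bronx', 'Northeast Bronx'],
    'year_description': ['Winter 2009-10', 'Winter 2009-10', '2005'],
    'data_valuemessage': [5.1, 4.8, 1.2],
})

# Drop traffic-density rows, then keep only neighborhood-level (UHF42) rows.
toy = toy[~toy['name'].str.startswith('Traffic Density')]
toy = toy[toy['geo_type_name'] == 'UHF42'].reset_index(drop=True)

# One description string per (indicator, period, measure) combination.
toy['pollutant'] = toy['name'] + ', ' + toy['year_description'] + ', ' + toy['Measure']

print(toy[['geo_entity_name', 'pollutant', 'data_valuemessage']])
```

Only the neighborhood-level Sulfur Dioxide row survives, and its description string is what later becomes a column name in the merged table.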

<font size =3 >Next, I'm going to import the geographic data I need to create the maps. I'm using GIS data provided by NYC DOH that shows the boundaries of each "United Hospital Fund" (UHF) neighborhood, the most granular regional designation in the Air Quality dataset.

First I clean the first row so pandas can handle the data, and I group the Air Quality dataset by region. I noticed some small differences in how certain neighborhoods were written (e.g. 'Crotona -Tremont' vs 'Crotona - Tremont'), so I use one of Python's handy dictionary comprehensions and the pandas replace method to fix them. I also group the pollutant types for use later.

Finally, I merge the air quality dataset into the neighborhoods dataset by neighborhood name.

In [4]:
neighborhoods = gpd.read_file('UHF_42_DOHMH_2009').dropna()
neighborhoods['UHF_NEIGH'][0] ='None'
neighborhoods = neighborhoods.set_index('UHF_NEIGH')
neighborhood_groups = air_data.groupby('geo_entity_name').groups
air_data = air_data.replace({x: y for x, y in zip(sorted(list(neighborhood_groups.keys())), sorted(list(neighborhoods.index))) if x!=y})
pollutant_groups = air_data.groupby('pollutant').groups
pollutants = [group for group in pollutant_groups]

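for group in pollutant_groups:
    # One value per neighborhood for this pollutant, aligned to the neighborhoods index.
    # (The join_axes argument was removed from pd.concat in pandas 1.0; reindex does the same alignment.)
    values = (air_data.loc[pollutant_groups[group]]
              .drop('index', axis=1)
              .drop_duplicates()
              .set_index('geo_entity_name')['data_valuemessage']
              .reindex(neighborhoods.index)
              .rename(group))
    neighborhoods = pd.concat([neighborhoods, values], axis=1)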
    
neighborhoods = neighborhoods.reset_index()  
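
The renaming step above pairs the two sorted name lists position by position, which works only if every spelling variant sorts into the same slot as its canonical form. As a hedged alternative (not what this notebook does), the sketch below matches each unrecognized name to its closest canonical name with the standard library's difflib. The function name `build_rename_map` and the 0.8 cutoff are assumptions chosen for illustration; the sample names are taken from the two tables shown in this notebook.

```python
from difflib import get_close_matches

def build_rename_map(source_names, canonical_names, cutoff=0.8):
    """Map each name missing from canonical_names to its closest canonical match."""
    canonical = list(canonical_names)
    rename = {}
    for name in source_names:
        if name in canonical:
            continue  # already spelled the canonical way
        match = get_close_matches(name, canonical, n=1, cutoff=cutoff)
        if match:
            rename[name] = match[0]
    return rename

# Spellings as they appear in the air quality table vs. the shapefile.
air_names = ['Crotona -Tremont', 'Fordham - Bronx Pk', 'Rockaways', 'East Harlem']
shape_names = ['Crotona - Tremont', 'Fordham - Bronx Park', 'Rockaway', 'East Harlem']

rename_map = build_rename_map(air_names, shape_names)
print(rename_map)
```

The resulting dictionary could be passed to `air_data.replace(...)` in the same way as the dictionary comprehension above. Names with no close match are left out of the dictionary so they can be inspected by hand.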

<font size =3> The final dataset should have everything we need to start exploring! As you can see, for each neighborhood we have geographic location and geometry data, as well as measurements for each pollutant. Now we can get to the fun part.

In [5]:
neighborhoods
Out[5]:
UHF_NEIGH OBJECTID UHFCODE SHAPE_Leng SHAPE_Area BOROUGH geometry Air Toxics Concentrations- Average Benzene Concentrations, 2005, Average Concentration Air Toxics Concentrations- Average Formaldehyde Concentrations, 2005, Average Concentration Boiler Emissions- Total NOx Emissions, 2013, Per km2 ... O3-Attributable Asthma ED Visits, 2009-2011, Rate- 18 Yrs and Older O3-Attributable Asthma ED Visits, 2009-2011, Rate- Children 0 to 17 Yrs Old O3-Attributable Asthma Hospitalizations , 2009-2011, Rate- 18 Yrs and Older O3-Attributable Asthma Hospitalizations , 2009-2011, Rate- Children 0 to 17 Yrs Old O3-Attributable Cardiac and Respiratory Deaths , 2009-2011, Rate PM2.5-Attributable Asthma ED Visits , 2009-2011, Rate- 18 Yrs and Older PM2.5-Attributable Asthma ED Visits , 2009-2011, Rate- Children 0 to 17 Yrs Old PM2.5-Attributable Cardiovascular Hospitalizations (Adults 40 Yrs and Older) , 2009-2011, Rate- 40 Years and Older PM2.5-Attributable Deaths , 2009-2011, Rate - Adults 30 Yrs and Older PM2.5-Attributable Respiratory Hospitalizations (Adults 20 Yrs and Older), 2009-2011, Rate- Adults 20 Yrs and Older
0 Kingsbridge - Riverdale 2 101.0 57699.154353 1.332914e+08 Bronx POLYGON ((1017992.893460184 269222.9642282724,... 2.9 3.2 42.5 ... 26.5 66.2 7.5 22.3 8.6 25.4 60.7 18.3 77.6 18.6
1 Northeast Bronx 3 102.0 88219.319109 1.813708e+08 Bronx POLYGON ((1025012.990312353 270794.260298267, ... 2.8 3.2 33.8 ... 42.2 79.3 9.8 32.2 5.4 37.6 73.4 16.2 57.0 18.6
2 Fordham - Bronx Park 4 103.0 59711.871991 1.407724e+08 Bronx POLYGON ((1023994.479554102 261065.9674189389,... 2.7 3.2 71.0 ... 73.6 125.0 14.4 43.6 4.0 68.7 122.8 18.8 49.6 20.5
3 Pelham - Throgs Neck 5 104.0 250903.372273 3.865737e+08 Bronx (POLYGON ((1017075.038996696 237316.1822414398... 2.7 3.2 24.9 ... 58.1 116.0 11.7 34.2 4.5 57.5 121.9 19.3 54.5 19.7
4 Crotona - Tremont 6 105.0 66676.089072 1.068978e+08 Bronx POLYGON ((1007916.255148947 252530.7522059381,... 3.0 3.4 62.5 ... 127.4 183.9 18.9 39.6 2.7 131.3 200.2 21.6 47.4 25.1
5 High Bridge - Morrisania 7 106.0 50241.649324 8.589956e+07 Bronx POLYGON ((1007936.999858111 247674.9997783601,... 3.0 3.6 78.0 ... 113.1 202.0 16.4 38.2 2.8 130.1 249.3 24.1 55.8 27.6
6 Hunts Point - Mott Haven 8 107.0 62182.674773 1.128093e+08 Bronx POLYGON ((1014093.000013277 243014.9999120235,... 2.8 3.5 36.8 ... 119.1 206.3 16.0 43.6 3.1 138.1 251.3 25.0 58.8 30.2
7 Greenpoint 9 201.0 48036.348146 1.047836e+08 Brooklyn POLYGON ((1005218.260544941 199987.1578182727,... 3.7 3.5 18.9 ... 26.9 58.6 3.5 10.3 3.6 28.2 59.6 20.5 45.6 10.0
8 Downtown - Heights - Slope 10 202.0 111232.015567 1.758001e+08 Brooklyn POLYGON ((992506.9700226933 196757.6035818607,... 3.7 3.2 32.0 ... 43.5 95.3 6.6 20.5 4.2 40.7 89.1 19.8 40.5 13.1
9 Bedford Stuyvesant - Crown Heights 11 203.0 69711.014325 1.662263e+08 Brooklyn POLYGON ((999487.0000258535 190339.9996191859,... 2.5 3.0 31.7 ... 105.4 191.4 13.5 40.2 4.8 95.4 178.4 23.9 55.9 16.3
10 East New York 12 204.0 75378.418333 1.551405e+08 Brooklyn POLYGON ((1021273.999758855 188676.9999924302,... 2.3 2.8 14.7 ... 95.0 164.0 11.5 29.8 3.5 85.0 153.2 22.4 45.9 13.9
11 Sunset Park 13 205.0 100114.866691 1.118635e+08 Brooklyn (POLYGON ((984461.1002876908 183641.4342761934... 3.2 3.2 13.4 ... 39.1 59.5 6.5 10.8 4.1 35.0 51.3 17.1 38.7 12.5
12 Borough Park 14 206.0 63216.738141 1.718763e+08 Brooklyn POLYGON ((991521.9998797774 178418.9998391867,... 2.5 2.9 34.8 ... 15.4 22.7 3.8 5.1 5.3 13.2 19.5 19.9 50.3 11.8
13 East Flatbush - Flatbush 15 207.0 73949.350573 1.937373e+08 Brooklyn POLYGON ((994544.0000693649 181118.0000185966,... 2.3 2.8 33.5 ... 65.9 129.2 7.3 26.9 3.9 57.9 115.8 18.5 43.2 10.5
14 Canarsie - Flatlands 16 208.0 142322.645199 3.598796e+08 Brooklyn (POLYGON ((1012385.999818355 178229.0002191812... 2.4 2.8 6.7 ... 49.8 97.7 7.1 21.5 5.7 40.9 81.7 18.3 49.7 12.5
15 Bensonhurst - Bay Ridge 17 209.0 73089.457275 1.603987e+08 Brooklyn POLYGON ((979545.9998816848 169007.9996323586,... 2.8 2.7 26.5 ... 13.7 24.9 2.9 5.0 7.2 11.3 20.7 16.5 53.6 10.4
16 Coney Island - Sheepshead Bay 18 210.0 101869.349860 2.288309e+08 Brooklyn POLYGON ((1001091.00009869 161590.0000326931, ... 2.4 2.5 23.7 ... 24.0 42.3 6.0 8.1 8.9 19.8 35.1 22.3 63.4 15.4
17 Williamsburg - Bushwick 19 211.0 49012.822810 1.066612e+08 Brooklyn POLYGON ((1004672.999824435 199428.9998138547,... 2.8 3.1 27.8 ... 113.7 185.8 13.6 41.2 3.7 106.8 184.6 26.9 50.0 18.2
18 Washington Heights - Inwood 20 301.0 67477.648187 9.890333e+07 Manhattan (POLYGON ((1005425.14234902 249641.0621191859,... 4.1 3.9 115.3 ... 41.7 130.2 5.8 18.3 3.2 44.5 138.5 18.1 46.0 12.5
19 Central Harlem - Morningside Heights 21 302.0 50891.462079 6.139621e+07 Manhattan POLYGON ((1002429.339941695 239601.3318706006,... 3.9 4.2 82.1 ... 130.4 269.9 10.3 38.1 4.3 137.0 291.1 19.7 63.6 18.8
20 East Harlem 22 303.0 53979.192786 6.089727e+07 Manhattan (POLYGON ((1006592.462192863 230451.8134242743... 4.2 4.2 55.8 ... 135.2 276.0 17.4 42.3 4.7 147.1 299.4 23.3 69.9 29.2
21 Upper West Side 23 304.0 46554.865062 5.827181e+07 Manhattan POLYGON ((993980.9998830259 233184.0002171099,... 3.9 4.0 247.9 ... 24.1 69.8 2.8 11.5 4.1 25.5 74.2 12.0 48.9 11.6
22 Upper East Side 24 305.0 49821.036696 5.438999e+07 Manhattan (POLYGON ((995360.0099202693 212727.9131765962... 4.5 4.5 269.8 ... 9.3 35.4 1.4 9.4 4.0 9.4 35.5 11.1 45.3 9.2
23 Chelsea - Clinton 25 306.0 87605.442609 7.943351e+07 Manhattan POLYGON ((986094.9999070317 220802.0000467747,... 4.9 4.6 204.8 ... 24.6 91.6 2.8 14.1 3.4 25.3 88.6 11.7 39.7 9.0
24 Gramercy Park - Murray Hill 26 307.0 38475.197152 4.828882e+07 Manhattan POLYGON ((994255.7896877825 213287.3133045137,... 5.3 5.3 284.7 ... 19.6 70.0 2.1 10.6 3.6 20.7 69.7 12.9 47.9 8.6
25 Greenwich Village - Soho 27 308.0 50948.825728 4.023911e+07 Manhattan POLYGON ((980850.7403824478 209925.413532272, ... 4.7 4.0 132.5 ... 8.9 24.2 1.0 6.2 3.4 9.2 24.5 9.6 41.0 5.4
26 Union Square - Lower East Side 28 309.0 34713.209340 5.732039e+07 Manhattan POLYGON ((987085.9998935312 208579.0002146065,... 5.0 4.0 126.1 ... 42.5 144.0 5.6 14.2 4.0 45.5 151.9 18.5 58.8 15.5
27 Lower Manhattan 29 310.0 50752.735123 3.513227e+07 Manhattan (POLYGON ((979738.8600602746 191848.9040816873... 6.3 4.2 118.7 ... 27.0 56.8 2.6 9.5 2.8 25.7 57.2 14.0 36.4 5.8
28 Long Island City - Astoria 30 401.0 72324.762815 1.923020e+08 Queens POLYGON ((1010873.509405032 223071.2428836077,... 2.7 3.6 30.6 ... 27.1 75.0 4.1 13.2 3.5 28.1 79.4 15.6 39.8 12.3
29 West Queens 31 402.0 116570.277413 3.246549e+08 Queens (POLYGON ((1015258.111856282 219212.6315063536... 2.2 3.4 24.6 ... 16.1 72.4 3.0 9.2 3.0 15.6 75.2 13.2 33.2 9.6
30 Flushing - Clearview 32 403.0 125334.924243 3.622995e+08 Queens (POLYGON ((1028750.298782602 218048.263100192,... 1.9 2.7 18.7 ... 10.5 40.0 2.6 11.2 5.4 9.3 38.0 14.1 43.6 11.3
31 Bayside - Little Neck 33 404.0 70215.833085 2.145688e+08 Queens POLYGON ((1057651.613027111 221022.1423230171,... 1.9 2.5 10.4 ... 7.8 27.5 2.2 8.5 4.5 6.7 22.1 12.5 34.5 10.2
32 Ridgewood - Forest Hills 34 405.0 87164.991021 2.652703e+08 Queens POLYGON ((1027432.99990803 206826.0001116842, ... 2.0 3.0 23.6 ... 16.6 56.3 3.4 16.0 5.5 14.9 51.1 15.7 45.7 12.4
33 Fresh Meadows 35 406.0 61672.825322 1.534321e+08 Queens POLYGON ((1045675.000169352 212142.0003064424,... 1.7 2.3 15.3 ... 20.7 60.9 3.7 13.6 4.8 17.2 47.7 14.9 43.4 9.7
34 Southwest Queens 36 407.0 123610.729914 2.725265e+08 Queens (POLYGON ((1024164.528754696 176334.7244801819... 1.8 2.4 14.5 ... 35.0 89.0 4.1 18.6 4.0 29.8 80.0 19.9 37.1 10.5
35 Jamaica 37 408.0 123620.448330 3.645499e+08 Queens (POLYGON ((1051427.401722282 176505.7694136798... 1.8 2.4 13.2 ... 54.1 144.7 7.5 28.3 5.2 41.6 115.8 18.9 42.7 12.7
36 Southeast Queens 38 409.0 129288.280297 3.616017e+08 Queens POLYGON ((1063633.546273187 216153.5254198462,... 1.8 2.4 8.1 ... 35.5 90.2 5.0 26.2 4.3 27.4 70.5 14.5 34.4 10.0
37 Rockaway 39 410.0 232823.135817 2.657288e+08 Queens (POLYGON ((1048934.908019602 163138.5048096925... 1.4 2.0 6.1 ... 59.7 137.4 9.0 23.6 9.0 47.8 111.8 19.8 69.8 18.8
38 Port Richmond 40 501.0 86322.590205 1.645195e+08 Staten Island POLYGON ((954321.1448196918 174584.4950410128,... 2.5 2.4 2.8 ... 84.7 118.8 12.1 21.2 5.7 64.5 92.0 16.4 49.1 16.6
39 Stapleton - St. George 41 502.0 107053.886650 3.272437e+08 Staten Island POLYGON ((962625.4608881921 175742.8644434363,... 1.3 2.3 4.7 ... 58.1 101.7 9.5 16.0 8.4 42.9 76.1 16.9 54.3 17.3
40 Willowbrook 42 503.0 117827.805092 4.131080e+08 Staten Island (POLYGON ((928508.201695025 163120.8847661912,... 1.6 2.3 2.1 ... 22.3 47.3 4.7 10.0 9.7 16.0 32.8 17.8 58.3 13.5
41 South Beach - Tottenville 43 504.0 175067.708940 7.311307e+08 Staten Island (POLYGON ((927254.1018410325 148338.3352631032... 1.1 2.2 2.0 ... 15.9 29.8 4.2 7.2 7.3 11.8 21.0 15.2 48.0 11.7

42 rows × 28 columns

<font size = 3>

Let's Make Some Maps!

I'm going to need this helper function to add line breaks to the chart titles.

In [6]:
#Thanks to Mohammad ElNesr on StackOverflow for this function!
def split_title_line(title_text, split_on='(', max_words=5):  # , max_words=None):
    """
    A function that splits any string based on specific character
    (returning it with the string), with maximum number of words on it
    """
    split_at = title_text.find (split_on)
    ti = title_text
    if split_at > 1:
        ti = ti.split (split_on)
        for i, tx in enumerate (ti[1:]):
            ti[i + 1] = split_on + tx
    if type (ti) == type ('text'):
        ti = [ti]
    for j, td in enumerate (ti):
        if td.find (split_on) > 0:
            pass
        else:
            tw = td.split ()
            t2 = []
            for i in range (0, len (tw), max_words):
                t2.append (' '.join (tw[i:max_words + i]))
            ti[j] = t2
    ti = [item for sublist in ti for item in sublist]
    ret_tex = []
    for j in range (len (ti)):
        for i in range(0, len(ti)-1, 2):
            if len (ti[i].split()) + len (ti[i+1].split ()) <= max_words:
                mrg = " ".join ([ti[i], ti[i+1]])
                ti = [mrg] + ti[2:]
                break

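    # Guard against titles that end up as a single line (ti[-2] would raise IndexError)
    if len (ti) >= 2 and len (ti[-2].split ()) + len (ti[-1].split ()) <= max_words: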
        mrg = " ".join ([ti[-2], ti[-1]])
        ti = ti[:-2] + [mrg]
    return '\n'.join (ti)

<font size =3> Thanks to geopandas, I can treat geographic data like any other dataframe, so it's relatively simple to plot heatmaps of all pollutants and health impacts. I inverted the color scale for Ozone concentration because, unlike the other pollutants, Ozone tends to be lowest in the neighborhoods where the other pollutants are highest, so inverting it makes the maps easier to compare. Every map is shown below, and I'll go into some interesting findings next!

Keep in mind that the colors on each map are scaled differently because the indicators have different ranges and impacts; for example, a "high" concentration of CO2 is different from a "high" concentration of PM2.5 (a common industrial pollutant).

In [7]:
plt.rcParams['axes.titlesize'] = 20
fig, axs = plt.subplots(nrows=11, ncols=2, figsize=(30, 165), dpi=100)
for axis, pollutant in enumerate(pollutants):
    row = axis // 2
    col = axis % 2
    cm = 'viridis'
    if('Ozone' in pollutant and 'Concentration' in pollutant):
        cm = 'viridis_r'
    divider = make_axes_locatable(axs[row][col])
    cax = divider.append_axes("right", size="5%", pad=0.1)
    axs[row][col] = neighborhoods.plot(ax=axs[row][col], cmap = cm, column = pollutant, legend = True, cax=cax)
    axs[row][col].set_title(split_title_line(pollutant, max_words = 5))
    axs[row][col].axis('off')
axs[-1][-1].axis('off')
plt.savefig('figs/master.png') 
plt.show()
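
The loop above places each map with `axis // 2` and `axis % 2` on a grid whose 11 rows are hard-coded. A small sketch of the same arithmetic, with the row count derived from the number of maps, is shown here. `grid_shape` and `grid_position` are hypothetical helper names, and 21 is assumed to be the number of indicators (the 28 columns of the merged table minus the 7 geographic ones).

```python
import math

def grid_shape(n_plots, ncols=2):
    """Return (nrows, ncols) large enough to hold n_plots subplots."""
    return math.ceil(n_plots / ncols), ncols

def grid_position(index, ncols=2):
    """Return the (row, col) of the index-th subplot, filling row by row."""
    return divmod(index, ncols)

nrows, ncols = grid_shape(21)            # 21 maps in two columns
positions = [grid_position(i, ncols) for i in range(21)]
unused = nrows * ncols - 21              # empty slots to hide with axis('off')

print(nrows, ncols, positions[:3], positions[-1], unused)
```

With these numbers the grid has one unused slot, which is why the cell above switches off the last axis.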

There's quite a bit of very interesting data here! It makes a lot of sense that the highest pollutant concentrations and the lowest Ozone levels occur in or around Midtown and Lower Manhattan. These areas have a lot of traffic congestion, are close to the ConEd factory, and are densely packed with people, industry, and vehicles. What's more interesting is comparing the location of a pollutant to the location of its health effects.

Analysis

Not only do we have data on many different types of pollutants, but we also have data on hospitalizations, emergency room visits, and deaths attributable to Ozone and PM2.5 (very small airborne particles that come from a wide range of industrial and natural processes). PM2.5 in particular can cause many respiratory, cardiovascular, and immune system problems.

In [8]:
plt.rcParams['axes.titlesize'] = 20
# Positions in `pollutants` of the PM2.5 concentration and three PM2.5 health indicators
pm25 = [6, 16, 18, 19]

def plot_four(pol):
    """Plot a 2x2 grid of maps for the four indices into `pollutants` given in `pol`."""
    f, axs = plt.subplots(figsize = (30, 30), dpi = 100, squeeze = False, nrows = 2, ncols = 2)
    for i, p in enumerate(pol):
        row = i // 2
        col = i % 2
        cm = 'viridis'
        # Ozone concentration uses the reversed color scale, as in the master figure
        if('Ozone' in pollutants[p] and 'Concentration' in pollutants[p]):
            cm = 'viridis_r'
        axs[row][col] = neighborhoods.plot(cmap = cm, ax = axs[row][col], column = pollutants[p], legend = True)
        axs[row][col].set_title(split_title_line(pollutants[p], max_words = 5))
        axs[row][col].axis('off')
    plt.show()

plot_four(pm25)

<font size = 3> On the top left are the neighborhood concentrations of PM2.5 particles. The other maps show where people were hospitalized or passed away for reasons attributable to PM2.5. As you can see, there's quite a big disparity between where the pollutant is concentrated and the areas most affected by it. There could be a variety of reasons for this. Maybe it's because many people in the outer boroughs commute to industrial parts of Manhattan to work. It could be due to wind patterns, or to socioeconomic circumstances limiting access to health care. It could also be that these kinds of health problems simply do not happen often enough for the differences to be statistically significant. An even more stark disparity can be seen with Ozone levels.

In [9]:
oz = [i for i in range(9, 16, 2)]
plot_four(oz)
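
The index lists `pm25` and `oz` depend on the order of `pollutants`, so they would silently point at different maps if the dataset gained or lost an indicator. A hedged sketch of selecting indices by name instead is shown below. `indices_matching` is a hypothetical helper, and the names in `example_pollutants` are shortened, made-up stand-ins rather than the real entries.

```python
def indices_matching(names, *keywords):
    """Return the positions of names that contain every keyword."""
    return [i for i, name in enumerate(names) if all(k in name for k in keywords)]

# Shortened, made-up stand-ins for entries of `pollutants`.
example_pollutants = [
    'Fine Particulate Matter (PM2.5), Annual Average, Average Concentration',
    'Ozone (O3), Summer, Average Concentration',
    'O3-Attributable Asthma ED Visits, Rate',
    'PM2.5-Attributable Deaths, Rate',
    'PM2.5-Attributable Asthma ED Visits, Rate',
]

pm25_idx = indices_matching(example_pollutants, 'PM2.5')
pm25_health_idx = indices_matching(example_pollutants, 'PM2.5', 'Attributable')
print(pm25_idx, pm25_health_idx)
```

A list built this way could be trimmed to four entries and passed to `plot_four`, which expects exactly four indices for its 2x2 grid.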

<font size = 3>

Aggregation and Interactive Visualization

Next, let's put all the pollution information together into an interactive map! Since each graph has a different scale, we need to figure out a way to put all the information together without losing data. We'll split the maps in two - one with the concentration of various pollutants, and one with the health effects.

In [10]:
pollutant_concentration = [pol for pol in pollutants if 'Attributable' not in pol and 'Ozone' not in pol]
health_effects = [pol for pol in pollutants if pol not in pollutant_concentration and ('Ozone' not in pol and 'O3' not in pol) ]

Next, we need to aggregate the data in a way that doesn't drown out the naturally lower ranges of concentration of some pollutants (for example, a 20% concentration of Sulfur Dioxide is a much more significant amount than a 20% concentration of Nitrogen Dioxide). Therefore, we will scale all the pollutants to be some value between 0 and 1, with 0 representing the lowest measured amount of that pollutant and 1 representing the highest. Then, we average all the normalized pollutant values. This is a nice way to get a good picture of the data as a whole, but remember that the coloring of the map is relative, showing the difference between the most and least polluted neighborhoods of New York, rather than the general amount of pollution in each neighborhood.

In [11]:
def normalize(df):
    return (df - df.min()) / (df.max() - df.min())
In [12]:
neighborhoods['avg_health_impact'] = sum(normalize(neighborhoods[column]) for column in health_effects)/len(health_effects)
neighborhoods['avg_concentration'] = sum([normalize(neighborhoods[column]) for column in pollutant_concentration])/len(pollutant_concentration)
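
As a worked example of the min-max scaling x' = (x - min) / (max - min) used above, here is a minimal sketch on two made-up indicators with very different ranges. The neighborhood labels, column names and values are invented for illustration.

```python
import pandas as pd

def normalize(series):
    """Min-max scale a Series to the range [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())

# Two made-up indicators on very different scales, for three neighborhoods.
toy = pd.DataFrame({
    'indicator_a': [2.0, 4.0, 6.0],
    'indicator_b': [100.0, 400.0, 200.0],
}, index=['North', 'Center', 'South'])

scaled = toy.apply(normalize)      # each column now runs from 0 to 1
toy_avg = scaled.mean(axis=1)      # same as sum / number of columns when nothing is missing

print(scaled)
print(toy_avg)
```

After scaling, both indicators contribute equally to the average even though their raw values differ by two orders of magnitude. 'Center' scores highest here because it is at the top of one indicator and in the middle of the other.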

<font size = 3>

Pollutant Concentration Map

Here is an interactive map of the average concentration that we just calculated.

In [13]:
m = folium.Map([40.730610, -73.935242], tiles = 'CartoDB positron')

m.add_child(folium.Choropleth(
        geo_data = neighborhoods.dropna(),
        name = 'pollution',
        data = neighborhoods.dropna(),
        columns = ['UHF_NEIGH', 'avg_concentration'],
        key_on='feature.properties.UHF_NEIGH',
        fill_color = 'YlOrRd',
        fill_opacity=.9,
        line_opacity=.4,
    )
)
Out[13]:
In [14]:
m = folium.Map([40.730610, -73.935242], tiles = 'CartoDB positron')

m.add_child(folium.Choropleth(
        geo_data = neighborhoods.dropna(),
        name = 'pollution',
        data = neighborhoods.dropna(),
        columns = ['UHF_NEIGH', 'avg_health_impact'],
        key_on='feature.properties.UHF_NEIGH',
        fill_color = 'BuPu',
        fill_opacity=.9,
        line_opacity=.4,
    )
)
Out[14]:

What have we learned?

With these two maps, we can see where levels of air pollutants are relatively high and, in general, which parts of the city feel the health effects of these pollutants the most. What happens in one part of New York surely affects the rest, often in ways that aren't obvious to us!